120 research outputs found

    Towards Building a Knowledge Base of Monetary Transactions from a News Collection

    We address the problem of extracting structured representations of economic events from a large corpus of news articles, using a combination of natural language processing and machine learning techniques. The developed techniques allow for semi-automatic population of a financial knowledge base, which, in turn, may be used to support a range of data mining and exploration tasks. The key challenge we face in this domain is that the same event is often reported multiple times, with varying correctness of details. We address this challenge by first collecting all information pertinent to a given event from the entire corpus, then considering all possible representations of the event, and finally using a supervised learning method to rank these representations by their associated confidence scores. A main innovative element of our approach is that it jointly extracts and stores all attributes of the event as a single representation (quintuple). Using a purpose-built test set, we demonstrate that our supervised learning approach can achieve a 25% improvement in F1-score over baseline methods that consider the earliest, the latest, or the most frequent reporting of the event. Comment: Proceedings of the 17th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '17), 201
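The three baselines named in the abstract above (earliest, latest, and most frequent reporting) can be illustrated with a minimal sketch. The data layout is an assumption: each report is a `(date, quintuple)` pair, and the quintuple fields shown are hypothetical placeholders, since the abstract does not list the actual attributes. The supervised ranking method itself is not reproduced here.

```python
from collections import Counter

def earliest(reports):
    # Baseline 1: trust the first report of the event.
    return min(reports, key=lambda r: r[0])[1]

def latest(reports):
    # Baseline 2: trust the most recent report of the event.
    return max(reports, key=lambda r: r[0])[1]

def most_frequent(reports):
    # Baseline 3: trust the representation reported most often.
    return Counter(rep for _, rep in reports).most_common(1)[0][0]

# Hypothetical reports of one event; ISO dates sort correctly as strings.
reports = [
    ("2017-01-03", ("AcmeCorp", "BetaInc", "acquires", "10M", "USD")),
    ("2017-01-05", ("AcmeCorp", "BetaInc", "acquires", "12M", "USD")),
    ("2017-01-09", ("AcmeCorp", "BetaInc", "acquires", "12M", "USD")),
    ("2017-01-12", ("AcmeCorp", "BetaInc", "acquires", "15M", "USD")),
]

print(earliest(reports))
print(latest(reports))
print(most_frequent(reports))
```

Each baseline picks a single report and ignores the rest, which is why a ranker that weighs all candidate representations can plausibly do better when reports disagree.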

    Temporal Expertise Profiling

    Abstract. We introduce the temporal expertise profiling task: identifying the skills and knowledge of an individual and tracking how they change over time. To be able to capture and distinguish meaningful changes, we propose the concept of a hierarchical expertise profile, where topical areas are organized in a taxonomy. Snapshots of hierarchical profiles are then taken at regular time intervals. Further, we develop methods for detecting and characterizing changes in a person’s profile, such as switching the main field of research or narrowing/broadening the topics of research. Initial results demonstrate the potential of our approach.

    Report on the 44th European Conference on Information Retrieval (ECIR 2022): The First Major Hybrid IR Conference

    The 44th European Conference on Information Retrieval (ECIR’22) was held in Stavanger, Norway. It represents a landmark, not only for being the northernmost ECIR ever, but also for being the first major IR conference in a hybrid format. This article reports on ECIR’22 from the organizers’ perspective, with a particular emphasis on elements of the hybrid setup, with the aim to serve as a reference and guidance for future hybrid conferences.

    Exploring Decomposition for Solving Pattern Mining Problems

    This article introduces a highly efficient pattern mining technique called Clustering-based Pattern Mining (CBPM). This technique discovers relevant patterns by studying the correlation between transactions in the transaction database based on clustering techniques. The set of transactions is first clustered, such that highly correlated transactions are grouped together. Next, we derive the relevant patterns by applying a pattern mining algorithm to each cluster. We present two different pattern mining algorithms, one applying an approximation-based strategy and another based on an exact strategy. The approximation-based strategy takes into account only the clusters, whereas the exact strategy takes into account both clusters and shared items between clusters. To boost the performance of CBPM, a GPU-based implementation is investigated. To evaluate the CBPM framework, we perform extensive experiments on several pattern mining problems. The results from the experimental evaluation show that CBPM provides a reduction in both runtime and memory usage. Also, CBPM based on the approximate strategy provides good accuracy, demonstrating its effectiveness and feasibility. Our GPU implementation achieves a speedup of up to 552× on a single GPU using big transaction databases.
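The cluster-then-mine idea described in the abstract above can be sketched roughly as follows. This is only an illustration of the approximation-based strategy (mining each cluster independently), not the CBPM algorithm itself. The greedy Jaccard clustering, the brute-force itemset counting, the per-cluster absolute `min_support`, and all function names are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

def jaccard(a, b):
    # Similarity of two item sets; defined as 0 when both are empty.
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_transactions(transactions, threshold=0.5):
    # Greedy clustering (an assumption): join the first cluster whose
    # first member is similar enough, otherwise open a new cluster.
    clusters = []
    for t in transactions:
        for c in clusters:
            if jaccard(t, c[0]) >= threshold:
                c.append(t)
                break
        else:
            clusters.append([t])
    return clusters

def frequent_itemsets(transactions, min_support, max_size=2):
    # Brute-force counting of itemsets up to max_size items.
    counts = Counter()
    for t in transactions:
        for k in range(1, max_size + 1):
            for items in combinations(sorted(t), k):
                counts[items] += 1
    return {items for items, n in counts.items() if n >= min_support}

def mine_per_cluster(transactions, min_support, threshold=0.5):
    # Approximate strategy: mine each cluster on its own and merge results,
    # ignoring items shared between clusters.
    patterns = set()
    for c in cluster_transactions(transactions, threshold):
        patterns |= frequent_itemsets(c, min_support)
    return patterns

db = [frozenset("ab"), frozenset("abc"), frozenset("xy"), frozenset("xyz")]
print(sorted(mine_per_cluster(db, min_support=2)))
```

Because each cluster is mined separately, patterns that span clusters can be missed, which is the gap the exact strategy addresses by also considering shared items.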

    Space-Efficient Support for Temporal Text Indexing in a Document Archive Context

    Support for temporal text-containment queries (queries for all versions of documents that contained one or more particular words at a particular time t) is of interest in a number of contexts, including web archives, on a smaller scale temporal XML/web warehouses, and temporal document database systems in general. In the V2 temporal document database system we employed a combination of full-text indexes and variants of time indexes to perform efficient text-containment queries. That approach was optimized for moderately large temporal document databases. However, for "extremely large databases" the index space usage of the approach could be too large. In this paper, we present a more space-efficient solution to the problem: the interval-based temporal text index (ITTX).
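A temporal text-containment query of the kind defined above can be illustrated with a toy interval-based index. This is a sketch of the general idea (one posting per validity interval rather than one per document version), not the ITTX structure from the paper. The class name, the half-open `[start, end)` convention, and the integer timestamps are assumptions.

```python
from collections import defaultdict

class IntervalTextIndex:
    """Toy index: word -> list of (doc_id, start, end) validity intervals."""

    def __init__(self):
        self.postings = defaultdict(list)

    def add(self, word, doc_id, start, end):
        # The word was present in doc_id for all times in [start, end).
        self.postings[word].append((doc_id, start, end))

    def query(self, words, t):
        # Return ids of documents that contained every word at time t.
        result = None
        for word in words:
            hits = {d for d, s, e in self.postings.get(word, []) if s <= t < e}
            result = hits if result is None else result & hits
        return result if result is not None else set()

index = IntervalTextIndex()
index.add("archive", "doc1", 0, 10)
index.add("temporal", "doc1", 5, 10)
index.add("archive", "doc2", 3, 8)

print(index.query(["archive"], 4))
print(index.query(["archive", "temporal"], 6))
```

A word that stays in a document across many versions costs one posting here instead of one per version, which is where the space saving of an interval-based layout comes from.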